Optimising Sentiment Classification using Preprocessing Techniques

Authors

  • Kranti Vithal Ghag
  • Ketan Shah
Abstract

Sentiment Classification refers to computational techniques for classifying whether the sentiment expressed in a text is positive or negative. As a specialized domain of text mining, Sentiment Classification is expected to benefit from preprocessing. In this paper we propose several models that selectively combine preprocessing techniques with Sentiment Classifiers in order to optimize Sentiment Classification. Unlike traditional preprocessing, in which punctuation symbols are simply discarded, we propose a set of rules that first handle words containing apostrophes and only then remove punctuation symbols. The Sentiment Classifiers, proposed in our previous research articles, are based on term weighting techniques. We evaluated the Sentiment Classification models by comparing them with state-of-the-art techniques on the movie sentence and movie document datasets. Accuracy increased from the unprocessed to the preprocessed datasets. Because our classifiers handle stopwords themselves, stopword removal during preprocessing had hardly any impact on them, unlike on traditional Sentiment Classifiers. Our classifiers also achieved better accuracy than a traditional classifier and another surveyed classifier based on a term weighting technique.

Keywords: Sentiment Classification; Pre-processing; Term Weighting; Term Frequency; Term Presence; Document Vectors
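The abstract does not list the apostrophe rules themselves, so the following is only a minimal sketch of the general idea: expand apostrophe words first, then strip punctuation. The `CONTRACTIONS` table and the function names `expand_apostrophes`, `preprocess`, and `naive_preprocess` are hypothetical illustrations, not the authors' rule set.

```python
import string

# Hypothetical expansion rules (an assumption for illustration only).
# Whole-word special cases come first so the generic "n't" rule does not
# turn "can't" into "ca not".
CONTRACTIONS = {
    "can't": "can not",
    "won't": "will not",
    "n't": " not",
    "'re": " are",
    "'ve": " have",
    "'ll": " will",
    "'m": " am",
}

# Translation table that deletes every ASCII punctuation character.
_STRIP_PUNCT = str.maketrans("", "", string.punctuation)


def expand_apostrophes(text):
    """Lowercase the text and expand apostrophe words using CONTRACTIONS."""
    text = text.lower().replace("\u2019", "'")  # normalise curly apostrophes
    for pattern, replacement in CONTRACTIONS.items():
        text = text.replace(pattern, replacement)
    return text


def preprocess(text):
    """Handle apostrophe words first, then remove punctuation and tokenise."""
    return expand_apostrophes(text).translate(_STRIP_PUNCT).split()


def naive_preprocess(text):
    """Baseline: discard punctuation directly, without apostrophe handling."""
    return text.lower().translate(_STRIP_PUNCT).split()


if __name__ == "__main__":
    sentence = "I can't say it isn't good!"
    print(naive_preprocess(sentence))
    print(preprocess(sentence))
```

In this sketch the naive baseline produces tokens such as `cant` and `isnt`, in which the negation is no longer a separate term, while the rule-based version keeps `not` as its own token that a term-weighting classifier can score.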





Publication date: 2015